Main protein isoforms and reference predictions agreement
Figure 1: The percentage of genes in which predicted reference isoforms coincided with proteomics main isoforms.
Selecting the splice variant that best represents a coding gene is a crucial first step in many experimental analyses, and vital for mapping clinically relevant variants.
To determine which method is best (APPRIS principal, MANE Select transcript, longest isoform or transcirpt expression) for selecting biological important reference splice variants for large-scale analyses.
Now appris.bioinfo.cnio.es contains the list of splice variants where APPRIS and MANE agree
GENCODE v37 gene set: Provided APPRIS principal isoforms, MANE (Matched Annotation from NCBI) Select transcripts, and longest isoforms/CDS
RNA-seq from Human Protein Atlas: data from 36 dierent human tissues was leveraged with QSplice (our in-house method to quantify splice-junctions) and RSEM transcript counts.
Large-scale proteomics: Five datasets covering 52 distinct tissue types were analysed with Comet and Percolator. We used peptide-spectrum match counts to determine main proteomics isoforms.
Germline variants: for all sets of exons we calculated the NS/Syn ratios for both rare and common allele frequencies using variants from the 2504 individuals in the 1000 Genomes Project phase 3.
Figure 1: The percentage of genes in which predicted reference isoforms coincided with proteomics main isoforms.
Figure 2: APPRIS and MANE chose the same transcript. The RNA-Seq transcript (ENST00000674748) is missing the first coding exon; the longest isoform has a different 3’ CDS (in yellow). APPRIS and MANE both select the highly conserved 5’ exon transcript as the most important splice variant.
Figure 3: Ns/Syn ratios for rare (dark bars) and common (light bars) variants (95% CI) for reference and alternative exons. The number of exons in each set is indicated above. Exons “L” are those that produce the longest isoform, “M” are present in MANE Select transcripts, and “P” represent APPRIS principal isoforms.
The main cellular isoform is best described by APPRIS principal isoforms and MANE Select transcripts, and these methods are particularly powerful when they agree.
Researchers should use these 2 sets of reference transcripts, rather than the longest isoform, in all biomedical research.